Sequential Weighting Algorithms for Multi-Alphabet Sources
Abstract
The sequential Context Tree Weighting procedure [6] achieves the asymptotically optimal redundancy behavior of k/2 · (log n)/n per symbol, where k is the number of free parameters of the source and n is the sequence length. For FSMX sources with an alphabet A, k = (|A| − 1) · |Ka|, where |Ka| is the number of states. However, FSMX sources in particular often use only a few of the possible letters in each state. It would be desirable for the linear term in the redundancy to be determined by the number of letters actually used in the states instead of by the alphabet size. We discuss sequential methods that achieve this redundancy behavior. First we describe a ‘natural’ application of the weighting principle to an appropriate class of sources; next we introduce a binary derived model combined with an appropriate estimator.
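The motivation above can be made concrete with a small calculation. The sketch below is only an illustration of the problem, not the weighting algorithms the abstract describes. It assumes the Krichevsky-Trofimov (KT) estimator commonly used with CTW, applied to a memoryless (single-state) source, and compares its codelength when it reserves probability for all 256 letters versus only the 2 letters actually used. It also counts free parameters for a hypothetical three-state model, reading "letters actually used" as the sum over states of (|A_s| − 1); that reading, and all function and state names, are our own assumptions.

```python
import math
from collections import Counter


def kt_codelength_bits(seq, alphabet_size):
    """Ideal codelength -log2 P(seq), in bits, of the sequential
    Krichevsky-Trofimov (Dirichlet-1/2) estimator over `alphabet_size` letters."""
    counts = Counter()  # missing letters count as 0
    bits = 0.0
    for t, symbol in enumerate(seq):
        # Conditional probability after t symbols: (count + 1/2) / (t + |A|/2)
        p = (counts[symbol] + 0.5) / (t + alphabet_size / 2)
        bits -= math.log2(p)
        counts[symbol] += 1
    return bits


def empirical_entropy_bits(seq):
    """n times the empirical (maximum-likelihood) entropy of seq, in bits."""
    n = len(seq)
    return -sum(c * math.log2(c / n) for c in Counter(seq).values())


def free_parameters(letters_per_state, alphabet_size):
    """Return (full-alphabet count (|A|-1)*|S|, used-letter count sum(|A_s|-1)).
    The used-letter count is an assumed reading of the abstract's goal."""
    full = (alphabet_size - 1) * len(letters_per_state)
    used = sum(len(letters) - 1 for letters in letters_per_state.values())
    return full, used


def redundancy_term_bits(k, n):
    """Leading total-redundancy term (k/2) * log2(n), in bits."""
    return k / 2 * math.log2(n)


if __name__ == "__main__":
    seq = "ab" * 500  # n = 1000, only 2 distinct letters occur
    n = len(seq)
    print("ML codelength :", round(empirical_entropy_bits(seq), 1))
    print("KT, |A| = 2   :", round(kt_codelength_bits(seq, 2), 1))
    print("KT, |A| = 256 :", round(kt_codelength_bits(seq, 256), 1))

    # Hypothetical 3-state FSMX model over a 256-letter alphabet.
    states = {"s0": {"a", "b"}, "s1": {"a"}, "s2": {"a", "b", "c"}}
    full, used = free_parameters(states, 256)
    print("k (full alphabet):", full, "->", round(redundancy_term_bits(full, n), 1), "bits")
    print("k (used letters) :", used, "->", round(redundancy_term_bits(used, n), 1), "bits")
```

Every conditional probability of the 256-letter estimator is smaller than that of the 2-letter one, so its codelength is strictly longer on this sequence; that gap is the cost the abstract's methods aim to avoid without knowing the used letters in advance.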
Similar resources
Count-Based Frequency Estimation with Bounded Memory
Count-based estimators are a fundamental building block of a number of powerful sequential prediction algorithms, including Context Tree Weighting and Prediction by Partial Matching. Keeping exact counts, however, typically results in a high memory overhead. In particular, when dealing with large alphabets the memory requirements of count-based estimators often become prohibitive. In this paper...
Sparse Sequential Dirichlet Coding
This short paper describes a simple coding technique, Sparse Sequential Dirichlet Coding, for multi-alphabet memoryless sources. It is appropriate in situations where only a small, unknown subset of the possible alphabet symbols can be expected to occur in any particular data sequence. We provide a competitive analysis which shows that the performance of Sparse Sequential Dirichlet Coding will ...
Implementing the Context Tree Weighting Method for Text Compression
The context tree weighting method is a universal compression algorithm for FSMX sources. Although it is expected to achieve a good compression ratio in practice, it is difficult to implement, and in many cases implementations serve only to estimate the compression ratio. Though Willems and Tjalkens showed a practical implementation using conditional rather than block probabilities, it is use...
Superior Guarantees for Sequential Prediction and Lossless Compression via Alphabet Decomposition
We present worst-case bounds for the learning rate of a known prediction method that is based on hierarchical applications of binary context tree weighting (CTW) predictors. A heuristic application of this approach that relies on Huffman’s alphabet decomposition is known to achieve state-of-the-art performance in prediction and lossless compression benchmarks. We show that our new bound for this...
On-line string matching algorithms: survey and experimental results
In this paper we present a short survey and experimental results for well-known sequential string matching algorithms. We consider algorithms based on different approaches, including classical, suffix automata, bit-parallelism and hashing. We put special emphasis on recently presented algorithms such as the Shift-Or and BNDM algorithms. We compare these algorithms in terms of the number of character...
Publication date: 1993